
43 research outputs found

Refined Complexity of PCA with Outliers

Author: Fomin Fedor V.
Golovach Petr A.
Panolan Fahad
Simonov Kirill
Publication date: 01/01/2019

Principal component analysis (PCA) is one of the most fundamental procedures in exploratory data analysis and is the basic step in applications ranging from quantitative finance and bioinformatics to image analysis and neuroscience. However, it is well-documented that the applicability of PCA in many real scenarios could be constrained by an "immune deficiency" to outliers such as corrupted observations. We consider the following algorithmic question about the PCA with outliers. For a set of $n$ points in $\mathbb{R}^{d}$, how to learn a subset of points, say 1% of the total number of points, such that the remaining part of the points is best fit into some unknown $r$-dimensional subspace? We provide a rigorous algorithmic analysis of the problem. We show that the problem is solvable in time $n^{O(d^2)}$. In particular, for constant dimension the problem is solvable in polynomial time. We complement the algorithmic result by the lower bound, showing that unless Exponential Time Hypothesis fails, in time $f(d)n^{o(d)}$, for any function $f$ of $d$, it is impossible not only to solve the problem exactly but even to approximate it within a constant factor.

Comment: To be presented at ICML 201
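
As a toy illustration of the problem statement only (this is not the algorithm from the paper), the sketch below brute-forces the outlier set for points in the plane with r = 1. It assumes the fitted subspace is a line through the origin and that fit quality is the sum of squared distances to that line; the function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def line_fit_residual(points):
    """Sum of squared distances from 2D points to the best-fit line through the origin.

    This equals the smaller eigenvalue of the 2x2 scatter matrix [[a, b], [b, c]].
    """
    a = sum(x * x for x, _ in points)
    b = sum(x * y for x, y in points)
    c = sum(y * y for _, y in points)
    return (a + c) / 2 - sqrt(((a - c) / 2) ** 2 + b * b)


def pca_with_outliers_bruteforce(points, k):
    """Try every set of k outlier indices; return (best residual, outlier indices)."""
    best = None
    for outliers in combinations(range(len(points)), k):
        removed = set(outliers)
        inliers = [p for i, p in enumerate(points) if i not in removed]
        cost = line_fit_residual(inliers)
        if best is None or cost < best[0]:
            best = (cost, outliers)
    return best


# Three points on the line y = x plus one corrupted observation.
points = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0), (0.0, 5.0)]
cost, outliers = pca_with_outliers_bruteforce(points, k=1)
print(outliers, cost)
```

The enumeration visits every size-k subset of the n points, so it is only practical for very small inputs.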

arXiv.org e-Print Archive

University of Bergen

NORA - Norwegian Open Research Archives

Parameterized complexity of PCA

Author: Fomin Fedor
Golovach Petr
Simonov Kirill
Publication venue: Dagstuhl publishing
Publication date: 01/01/2020

We discuss some recent progress in the study of Principal Component Analysis (PCA) from the perspective of Parameterized Complexity.

University of Bergen

Dagstuhl Research Online Publication Server

NORA - Norwegian Open Research Archives

Consistency-Checking Problems: A Gateway to Parameterized Sample Complexity

Author: Ganian Robert
Khazaliya Liana
Simonov Kirill
Publication date: 22/08/2023

Recently, Brand, Ganian and Simonov introduced a parameterized refinement of the classical PAC-learning sample complexity framework. A crucial outcome of their investigation is that for a very wide range of learning problems, there is a direct and provable correspondence between fixed-parameter PAC-learnability (in the sample complexity setting) and the fixed-parameter tractability of a corresponding "consistency checking" search problem (in the setting of computational complexity). The latter can be seen as generalizations of classical search problems where instead of receiving a single instance, one receives multiple yes- and no-examples and is tasked with finding a solution which is consistent with the provided examples. Apart from a few initial results, consistency checking problems are almost entirely unexplored from a parameterized complexity perspective. In this article, we provide an overview of these problems and their connection to parameterized sample complexity, with the primary aim of facilitating further research in this direction. Afterwards, we establish the fixed-parameter (in)tractability for some of the arguably most natural consistency checking problems on graphs, and show that their complexity-theoretic behavior is surprisingly very different from that of classical decision problems. Our new results cover consistency checking variants of problems as diverse as (k-)Path, Matching, 2-Coloring, Independent Set and Dominating Set, among others.

arXiv.org e-Print Archive

Parameterized k-Clustering: Tractability Island

Author: Fomin Fedor
Golovach Petr
Simonov Kirill
Publication venue: Dagstuhl Publishing
Publication date: 17/01/2020

In k-Clustering we are given a multiset of $n$ vectors $X \subset \mathbb{Z}^d$ and a nonnegative number $D$, and we need to decide whether $X$ can be partitioned into $k$ clusters $C_1, \ldots, C_k$ such that the cost $\sum_{i=1}^{k} \min_{c_i \in \mathbb{R}^d} \sum_{x \in C_i} \|x - c_i\|_p^p \le D$, where $\|\cdot\|_p$ is the Minkowski ($L_p$) norm of order $p$. For $p=1$, k-Clustering is the well-known k-Median. For $p=2$, the case of the Euclidean distance, k-Clustering is k-Means. We study k-Clustering from the perspective of parameterized complexity. The problem is known to be NP-hard for $k=2$ and it is also NP-hard for $d=2$. It is a long-standing open question, whether the problem is fixed-parameter tractable (FPT) for the combined parameter $d+k$. In this paper, we focus on the parameterization by $D$. We complement the known negative results by showing that for $p=0$ and $p=\infty$, k-Clustering is W[1]-hard when parameterized by $D$. Interestingly, the complexity landscape of the problem appears to be more intricate than expected. We discover a tractability island of k-Clustering: for every $p \in (0,1]$, k-Clustering is solvable in time $2^{O(D \log D)} (nd)^{O(1)}$.
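
To make the cost function concrete, here is a minimal sketch that evaluates the k-Clustering cost of one given partition for p = 1 and p = 2 and compares it with the budget D. It does not search over partitions, so it is not a solver for the decision problem, and the function names are hypothetical. It relies on the standard facts that the coordinate-wise median minimizes the p = 1 cost and the coordinate-wise mean minimizes the p = 2 cost.

```python
from statistics import mean, median


def cluster_cost(cluster, p):
    """Cost of one cluster with its optimal center, for p = 1 or p = 2.

    For p = 1 the optimal center is the coordinate-wise median (k-Median);
    for p = 2 it is the coordinate-wise mean (k-Means).
    """
    if p not in (1, 2):
        raise ValueError("this sketch only handles p = 1 and p = 2")
    center_of = median if p == 1 else mean
    center = [center_of(coords) for coords in zip(*cluster)]
    # |x - c|_p^p is the sum over coordinates of |x_j - c_j|^p.
    return sum(abs(x_j - c_j) ** p for x in cluster for x_j, c_j in zip(x, center))


def clustering_cost(clusters, p):
    """Total cost of a partition, given as a list of clusters of integer vectors."""
    return sum(cluster_cost(cluster, p) for cluster in clusters)


def within_budget(clusters, p, D):
    """Check whether this particular partition certifies a yes-instance for budget D."""
    return clustering_cost(clusters, p) <= D


clusters = [[(0, 0), (2, 0), (4, 0)], [(10, 10)]]
print(clustering_cost(clusters, 1), clustering_cost(clusters, 2))
```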

University of Bergen

On Coresets for Fair Clustering in Metric and Euclidean Spaces and Their Applications

Author: Bandyapadhyay Sayan
Fomin Fedor V.
Simonov Kirill
Publication venue: PDXScholar
Publication date: 01/06/2024

Fair clustering is a constrained clustering problem where we need to partition a set of colored points. The fraction of points of each color in every cluster should be more or less equal to the fraction of points of this color in the dataset. The problem was recently introduced by Chierichetti et al. (2017) [1]. We propose a new construction of coresets for fair clustering for Euclidean and general metrics based on random sampling. For the Euclidean space $\mathbb{R}^d$, we provide the first coreset whose size does not depend exponentially on the dimension $d$. The question of whether such constructions exist was asked by Schmidt et al. (2019) [2] and Huang et al. (2019) [5]. For general metrics, our construction provides the first coreset for fair clustering. New coresets appear to be a handy tool for designing better approximation and streaming algorithms for fair and other constrained clustering variants.
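
The fairness condition can be illustrated with a small check. The sketch below measures, for a given partition, the largest additive gap between a color's share in a cluster and its share in the whole dataset. Using an additive gap is an assumption made for illustration, since the abstract only says "more or less equal"; the function name is hypothetical, and clusters are assumed to be nonempty.

```python
from collections import Counter


def max_fairness_deviation(colors, clusters):
    """Largest gap between a color's share in some cluster and its share overall.

    colors:   list where colors[i] is the color of point i
    clusters: list of nonempty lists of point indices (a partition of the points)
    """
    total = Counter(colors)
    n = len(colors)
    worst = 0.0
    for cluster in clusters:
        counts = Counter(colors[i] for i in cluster)
        for color, overall in total.items():
            gap = abs(counts[color] / len(cluster) - overall / n)
            worst = max(worst, gap)
    return worst


colors = ["red", "red", "blue", "blue"]
balanced = [[0, 2], [1, 3]]  # every cluster is half red, half blue
skewed = [[0, 1], [2, 3]]    # one all-red cluster, one all-blue cluster
print(max_fairness_deviation(colors, balanced), max_fairness_deviation(colors, skewed))
```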

PDXScholar (Portland State University)

Building Large k-Cores from Sparse Graphs

Author: Fomin Fedor V.
Sagunov Danil
Simonov Kirill
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 45th International Symposium on Mathematical Foundations of Computer Science (MFCS 2020)
Publication date: 01/01/2020

A popular model to measure network stability is the k-core, that is, the maximal induced subgraph in which every vertex has degree at least k. For example, k-cores are commonly used to model the unraveling phenomena in social networks. In this model, users having fewer than k connections within the network leave it, so the remaining users form exactly the k-core. In this paper we study the question of whether it is possible to make the network more robust by spending only a limited amount of resources on new connections. A mathematical model for the k-core construction problem is the following Edge k-Core optimization problem. We are given a graph G and integers k, b and p. The task is to ensure that the k-core of G has at least p vertices by adding at most b edges. The previous studies on Edge k-Core demonstrate that the problem is computationally challenging. In particular, it is NP-hard when k = 3, W[1]-hard when parameterized by k+b+p (Chitnis and Talmon, 2018), and APX-hard (Zhou et al, 2019). Nevertheless, we show that there are efficient algorithms with provable guarantee when the k-core has to be constructed from a sparse graph with some additional structural properties. Our results are:
- When the input graph is a forest, Edge k-Core is solvable in polynomial time.
- Edge k-Core is fixed-parameter tractable (FPT) when parameterized by the minimum size of a vertex cover in the input graph. On the other hand, with such parameterization, the problem does not admit a polynomial kernel subject to a widely-believed assumption from complexity theory.
- Edge k-Core is FPT parameterized by the treewidth of the graph plus k. This improves upon a result of Chitnis and Talmon by not requiring b to be small.

Each of our algorithms is built upon a new graph-theoretical result interesting in its own
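
The unraveling process described in this abstract can be sketched with the standard peeling procedure: repeatedly delete vertices of degree less than k until none remain. This is only an illustration of what a k-core is and how one added edge can enlarge it, not one of the paper's algorithms for Edge k-Core; the helper names and the example graph are hypothetical.

```python
def k_core(adj, k):
    """Return the vertex set of the k-core by repeatedly deleting vertices of degree < k.

    adj: dict mapping each vertex to the set of its neighbors (undirected graph).
    """
    alive = set(adj)
    degree = {v: len(adj[v]) for v in adj}
    stack = [v for v in adj if degree[v] < k]
    while stack:
        v = stack.pop()
        if v not in alive:
            continue  # already deleted via an earlier stack entry
        alive.remove(v)
        for u in adj[v]:
            if u in alive:
                degree[u] -= 1
                if degree[u] < k:
                    stack.append(u)
    return alive


def add_edge(adj, u, v):
    """Return a copy of the graph with the edge {u, v} added."""
    new_adj = {w: set(nbrs) for w, nbrs in adj.items()}
    new_adj[u].add(v)
    new_adj[v].add(u)
    return new_adj


# Triangle a-b-c with a pendant vertex d attached to a.
graph = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
core_before = k_core(graph, 2)                      # d unravels, the triangle stays
core_after = k_core(add_edge(graph, "d", "b"), 2)   # one new edge keeps d in the core
print(sorted(core_before), sorted(core_after))
```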

arXiv.org e-Print Archive

University of Bergen

Dagstuhl Research Online Publication Server

NORA - Norwegian Open Research Archives

On Coresets for Fair Clustering in Metric and Euclidean Spaces and Their Applications

Author: Bandyapadhyay Sayan
Fomin Fedor V.
Simonov Kirill
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 48th International Colloquium on Automata, Languages, and Programming (ICALP 2021)
Publication date: 20/07/2020

Fair clustering is a constrained variant of clustering where the goal is to partition a set of colored points, such that the fraction of points of any color in every cluster is more or less equal to the fraction of points of this color in the dataset. This variant was recently introduced by Chierichetti et al. [NeurIPS, 2017] in a seminal work and became widely popular in the clustering literature. In this paper, we propose a new construction of coresets for fair clustering based on random sampling. The new construction allows us to obtain the first coreset for fair clustering in general metric spaces. For Euclidean spaces, we obtain the first coreset whose size does not depend exponentially on the dimension. Our coreset results solve open questions proposed by Schmidt et al. [WAOA, 2019] and Huang et al. [NeurIPS, 2019]. The new coreset construction helps to design several new approximation and streaming algorithms. In particular, we obtain the first true constant-approximation algorithm for metric fair clustering, whose running time is fixed-parameter tractable (FPT). In the Euclidean case, we derive the first $(1+\epsilon)$-approximation algorithm for fair clustering whose time complexity is near-linear and does not depend exponentially on the dimension of the space. Besides, our coreset construction scheme is fairly general and gives rise to coresets for a wide range of constrained clustering problems. This leads to improved constant-approximations for these problems in general metrics and near-linear time $(1+\epsilon)$-approximations in the Euclidean metric.

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

Parameterized k-Clustering: Tractability Island

Author: Fomin Fedor V.
Golovach Petr A.
Simonov Kirill
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 39th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2019)
Publication date: 01/01/2019

University of Bergen

Dagstuhl Research Online Publication Server

NORA - Norwegian Open Research Archives

Manipulating Districts to Win Elections: Fine-Grained Complexity

Author: Eiben Eduard
Fomin Fedor V.
Panolan Fahad
Simonov Kirill
Publication date: 18/02/2020

Gerrymandering is a practice of manipulating district boundaries and locations in order to achieve a political advantage for a particular party. Lewenberg, Lev, and Rosenschein [AAMAS 2017] initiated the algorithmic study of a geographically-based manipulation problem, where voters must vote at the ballot box closest to them. In this variant of gerrymandering, for a given set of possible locations of ballot boxes and known political preferences of $n$ voters, the task is to identify locations for $k$ boxes out of $m$ possible locations to guarantee victory of a certain party in at least $l$ districts. Here the integers $k$ and $l$ are some selected parameters. It is known that the problem is NP-complete already for 4 political parties and prior to our work only heuristic algorithms for this problem were developed. We initiate the rigorous study of the gerrymandering problem from the perspectives of parameterized and fine-grained complexity and provide asymptotically matching lower and upper bounds on its computational complexity. We prove that the problem is W[1]-hard parameterized by